
8 research outputs found

Adaptive multi-fidelity optimization with fast learning rates

Author: Fiegel Côme
Gabillon Victor
Valko Michal
Publication venue: HAL CCSD
Publication date: 01/01/2020

In multi-fidelity optimization, we have access to biased approximations of the target function at varying costs. In this work, we study the setting of optimizing a locally smooth function with a limited budget Λ, where the learner must trade off the cost against the bias of these approximations. We first prove lower bounds for the simple regret under different assumptions on the fidelities, based on a cost-to-bias function. We then present the Kometo algorithm, which achieves the same rates up to additional logarithmic factors, without any knowledge of the function smoothness or the fidelity assumptions, improving on prior results. Finally, we empirically show that our algorithm outperforms prior multi-fidelity optimization methods without knowledge of problem-dependent parameters.

INRIA a CCSD electronic archive server

Adapting to game trees in zero-sum imperfect information games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication date: 23/12/2022

Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.

arXiv.org e-Print Archive

HAL-ENS-LYON

INRIA a CCSD electronic archive server

HAL-Polytechnique

Local and adaptive mirror descents in extensive-form games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication date: 01/09/2023

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by $T$. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of $\tilde{\mathcal{O}}(T^{-1/2})$ with high probability and has a near-optimal dependence on the game parameters when applied with the best theoretical choices of learning rates and sampling policies. To achieve these results, we generalize the notion of OMD stabilization, allowing for time-varying regularization with convex increments.

arXiv.org e-Print Archive

Adaptive multi-fidelity optimization with fast learning rates

Author: Fiegel Côme
Gabillon Victor
Valko Michal
Publication venue: HAL CCSD
Publication date: 01/01/2020

INRIA a CCSD electronic archive server

HAL Descartes

Hal-Diderot

Local and adaptive mirror descents in extensive-form games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication venue: HAL CCSD
Publication date: 23/07/2023

We study how to learn $\epsilon$-optimal strategies in zero-sum imperfect information games (IIG) with trajectory feedback. In this setting, players update their policies sequentially based on their observations over a fixed number of episodes, denoted by $T$. Existing procedures suffer from high variance due to the use of importance sampling over sequences of actions (Steinberger et al., 2020; McAleer et al., 2022). To reduce this variance, we consider a fixed sampling approach, where players still update their policies over time, but with observations obtained through a given fixed sampling policy. Our approach is based on an adaptive Online Mirror Descent (OMD) algorithm that applies OMD locally to each information set, using individually decreasing learning rates and a regularized loss. We show that this approach guarantees a convergence rate of $\tilde{\mathcal{O}}(T^{-1/2})$ with high probability and has a near-optimal dependence on the game parameters when applied with the best theoretical choices of learning rates and sampling policies. To achieve these results, we generalize the notion of OMD stabilization, allowing for time-varying regularization with convex increments.

HAL-Polytechnique

Adapting to game trees in zero-sum imperfect information games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication venue: HAL CCSD
Publication date: 16/01/2023

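Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.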

INRIA a CCSD electronic archive server

Adapting to game trees in zero-sum imperfect information games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication venue: HAL CCSD
Publication date: 16/01/2023

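Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.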

HAL-Polytechnique

Adapting to game trees in zero-sum imperfect information games

Author: Fiegel Côme
Kozuno Tadashi
Munos Rémi
Ménard Pierre
Perchet Vianney
Valko Michal
Publication venue: HAL CCSD
Publication date: 16/01/2023

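Imperfect information games (IIG) are games in which each player only partially observes the current game state. We study how to learn $\epsilon$-optimal strategies in a zero-sum IIG through self-play with trajectory feedback. We give a problem-independent lower bound $\mathcal{O}(H(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ on the required number of realizations to learn these strategies with high probability, where $H$ is the length of the game and $A_{\mathcal{X}}$ and $B_{\mathcal{Y}}$ are the total numbers of actions for the two players. We also propose two Follow the Regularized Leader (FTRL) algorithms for this setting: Balanced-FTRL, which matches this lower bound but requires knowledge of the information set structure beforehand to define the regularization; and Adaptive-FTRL, which needs $\mathcal{O}(H^2(A_{\mathcal{X}}+B_{\mathcal{Y}})/\epsilon^2)$ plays without this requirement by progressively adapting the regularization to the observations.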

HAL-ENS-LYON